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We present a Maximum Entropy method (MEM) for obtaining dynamical spectra from Quantum 
Monte Carlo data which have a sign problem. By relating the sign fluctuations to the norm of the 
spectra, our method properly treats the correlations between the measured quantities and the sign. 
The method greatly improves the quality and the resolution of the spectra, enabling it to produce 
good spectra even for poorly conditioned data where standard MEM fails. 



I. INTRODUCTION 

One of the greatest advantages of Quantum Monte 
Carlo (QMC) simulations is the possibility to deal with 
complex and large size systems. The tremendous increase 
in computing capabilities and the development of new 
QMC based algorithms in recent years gives rise to new 
opportunities for QMC simulations. However, the pos- 
sibility of producing new data implies a series of new 
problems for processing and analyzing the data. 

Despite the QMC successes, these simulations have 
some general limitations. One such difficulty is the sign 
problem, which affects a large class of quantum models 
and appears when the sampling weight of some config- 
urations is not positive definite. Another drawback of 
QMC simulations is that, while static measurements are 
easily obtained, calculating dynamical quantities is ex- 
tremely difficult. When the sign problem is present, this 
difficulty is much more serious. In the past, the limited 
computing capabilities available didn't allow for simula- 
tions with a small average sign. With the advent of new 
parallel vector machines such as the CRAY XI at ORNL 
however, the speed of these calculations is significantly 
improved, making simulations with a small average sign 
feasible. This necessitates major improvements in the 
methods used to analyze the new data. 

The standard technique of extracting dynamical spec- 
tra from QMC simulations based on the Matsubara-time 
path integral formalism is the Maximum Entropy Method 
(MEM) HEH0. The dynamical properties contain 
important information about excited states and describe 
the system's response to different external perturbations, 
making the direct connection between model and experi- 
ment. Therefore an algorithm able to produce dynamical 
quantities is of crucial importance. The goal of this paper 
is to describe an improved MEM technique of calculating 
dynamical properties of a system from QMC data with 
a sign problem. 

MEM recasts the problem of spectra calculation from 
a deterministic problem to one of probability optimiza- 
tion. In principle, by knowing the imaginary time re- 
sponse functions, the dynamical spectra can be obtained 
by solving an integral equation. However, in practice, 



the calculation of spectra is an ill-defined problem. Due 
to the fact that QMC provides information on a finite 
number of time points with a certain error bar, an infi- 
nite number of solutions consistent with the data exists. 
MEM is an algorithm which, based on Bayesian infer- 
ence [j| , provides the most probable spectrum compatible 
with the available data [(|. 

The spectrum probability is calculated assuming Gaus- 
sianly distributed and uncorrelated data. As the central 
limit theorem requires, the statistics of any average is 
Gaussian as long as the average is taken over a large num- 
ber of uncorrelated points. Methods have been developed 
to reduce the correlations in the data, both those between 
adjacent measurements and those at different Matsubara 
time points in the same measurement 0. 

However, when the sign problem is present, the QMC 
data becomes very poorly conditioned, which greatly 
complicates the MEM problem. Non-Gaussian distri- 
butions and strong correlations of the data turn out to 
be very severe problems. They cannot be removed by 
the standard techniques, mainly due to the strong cor- 
relation between the data and the averaged sign of the 
configurations which produce these data. This makes it 
essentially impossible to calculate spectra long before the 
minus sign problem makes the calculation of static prop- 
erties impractical. In this paper we address this problem 
and describe a solution which greatly increases the reso- 
lution of MEM when calculating spectra from such poorly 
conditioned data. 

This paper is organized as follows. In Sec. [H] we intro- 
duce the general MEM formalism. In Sec. IHII we discuss 
and exemplify the problems which appear when the QMC 
simulations suffer by the sign problem. A solution to the 
problem is given in Sec. IIVI A comparison of spectra 
obtained with the standard and the improved method is 
presented in Sec.[V] The conclusions are given in Sec. lVIl 



II. MEM FORMALISM 

We start with a brief introduction of MEM0. MEM 
is an algorithm which aims to determine the spectral 
decomposition of one- or two-particle Green's functions. 
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Most QMC methods only produce estimates of the imag- 
inary time Green's functions. The relation between the 
spectral density, A(uS), and the imaginary time Green's 
function, G(t), is given by an integral equation 



(1) 



G(t) = J K{r,uj)A{uj)duj 
where the kernel, K(t,u>), is given by 



1 + e~P" 

for the one-particle Green's function, and respectively 



(2) 



K(t,w) = - 



IT 1 



(3) 



for the two-particle susceptibility [T^j . 

The determination of the spectrum is an ill-posed prob- 
lem, since an infinite number of solutions exists which 
are consistent with the QMC data and associated error 
bars. MEM selects from these solutions the most proba- 
ble one. According to Bayesian logic 0, given the data 
G, the conditional probability of the spectrum A, P[A\G], 
is given by 



P[A\G] = P[G\A] P[A]/P[G] . 



(4) 



Here P[G|A] is the likelihood function which represents 
the conditional probability of the data G given A, P[A] 
is the prior probability which contains prior information 
about A and P[G] is called the evidence and can be con- 
sidered a normalization constant. 
The prior probability is given by 



P[A] 



„aS 



(5) 



with a real positive constant a and the entropy function 
S defined by 



S 



du{A{uo) - m{uj) - A{u)]n[A{u)/m{u)}). (6) 



m(oj) is a function called "default model". The specific 
form of the entropy function is a result of some general 
and reasonable assumption imposed on the spectrum, like 
subset independence, coordinate invariance, system inde- 
pendence and scaling. By defining the entropy relative 
to a default model, the prior probability is also used to 
incorporate prior knowledge about the spectrum, such 
as the high-frequency behavior and certain sum-rules. 
In the absence of data the resultant spectrum will be 
identical to the model. The entropic probability and 
its consequences are discussed at large in a series of pa- 
pers pi IM liJ. 1 1 (1 UA\ . and does not constitute the subject 
of this study. 

The main focus of this investigation is the calculation 
of the likelihood function, P[G\A]. The central limit the- 
orem shows that the distribution of the data obtained in 
a QMC process is always Gaussian if every data point is 



taken as an average of a large enough number of mea- 
surements so that different data are independent. This 
implies 



P[G\A] = e 



-x 2 /2 



(7) 



where 



X 2 = E " G i (A))[C- l } ij (G j - Gj(A)), (8) 
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(9) 



with the covariance 



Cij — 



M 



1 M 
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(10) 



Here we considered that QMC provides G(r) on L time 
points Tj, and denote G(tj) = G;. For every time point, 

Ti, we have M measured G\ m \ centered at Gi (Eq.[3J). G 
m Eq. (JTUJ is the covariance matrix which characterizes 
the second moment of the data. Gt {A) in Eq. © is the 
value of G(ri) which corresponds to the spectrum A(lo) 
according to Eq. ^ or its discretized form. 

MEM requires Gaussianly distributed data. Otherwise, 
the likelihood probability defined in Eq. Q has no mean- 
ing. In theory, the requirement for a Gaussian distri- 
bution is achieved by averaging many measurements to 
obtain one data point. However, in practice when com- 
putational resources are limited, this condition is often 
difficult to satisfy. In MEM literature the data points ob- 
tained by averaging many measurements are called bins. 
The usual way to remove the correlation between bins is 
to re-average (coarse-grain) more successive bins which 
results in increasing the number of measurements per 
bin. However, for a fixed amount of data, this process 
of increasing the bin size will reduce the number of data 
points (bins). If the number of bins is too small, the data 
cannot properly describe a statistic process, and the co- 
variance matrix becomes pathological. Therefore a suc- 
cessful MEM for correlated data requires a large number 
of measurements, implying both large bins and a large 
number of bins. 

The correlation of data between different time points, 
is the other relevant problem which causes MEM to fail. 
These correlations can be removed by a rotation U which 
diagonalizes the covariance matrix. 



(11) 



The data and the kernel should also be rotated accord- 
ingly 



K' = U~ l K, G' = U^G 



(12) 
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FIG. 1: Histogram of the distribution of a) Gs(/3/16), b) sign, 
s and c) G(/3/16) when the bin size is increased five times 
which corresponds to 3000 measurements per bin. The dashed 
lines represent the best Gaussian fit to the data. 



The rotated G\ are the statistically independent direc- 
tions, and in this basis, \ 2 reduces to 



i=i 



07 



(13) 



In Eq. E| the discretized form of Eq. ^ was used, the 
summation over v meaning summation over frequency. 



III. DATA PRODUCED BY QMC 
SIMULATIONS WITH SIGN PROBLEM 

When the sign problem is present in QMC simulations, 
the condition of Gaussianly distributed Gi becomes more 
difficult to satisfy. Very often, a huge number of mea- 
surements, beyond the available computing possibilities, 
would be required to accomplish this task. 

The difficulty in obtaining good data points for Gi can 
be easily understood from the measurement process. In 
a QMC process where the sign of the sampling weight is 
negative, it can no longer define a probability. Therefore, 
the sign of the sampling weight must be associated with 
the measurement. For the Green's function, we can no 
longer measure Gi but rather the product of it and the 
sign s of the configuration, Gst = Gi * s, and the sign 
s. At the end of the simulation, i.e. after a large num- 
ber of measurements, we then obtain Gi = Gsj/s, where 
the overbar denotes averaging over the number of mea- 
surements. Two problems related with the sign affect the 
quality of the Gi data. First, in order to obtain good data 
points G\ m \ for every data point we need to average a 
very large number of Gsi and s and afterwards calculate 
G l (m) = G Sl (m) /s (m) - Here Gs^ and «("0 both denote 
averages over the measurements that form the bin m. 
Smaller average signs s worsen the problem, since any 
small variation of s has a large effect on Gi (oc 1/s 2 ). 



Second, as within the same bin m there is a strong corre- 
lation between different data points Gs\ , there is also 
a strong correlation between data points Gsf 1 ^ and s^ m \ 

The points G^ = Gs\ / are obtained by a nonlin- 
ear operation of these correlated quantities, and there is 
no reason to expect them to be normal distributed. 

In order to exemplify the problems discussed above, we 
employed a QMC based algorithm to produce a very 
large amount of data for the single-particle Green's func- 
tion and the two-particle spin susceptibility in the two- 
dimensional Hubbard model on a square lattice. The 
Hubbard model is characterized by the single-particle 
hopping t between nearest neighbors and the on-site 
Coulomb repulsion U. We choose t — 0.25 so that the 
bandwidth W = 2 and set U = W. To make the sign- 
problem worse, we add a next-nearest neighbor hopping 
t' = — 0.3i to frustrate the lattice. We perform calcu- 
lations on a 16-site 4x4 cluster at 15% doping, down 
to temperatures T = 0.125£ where we experience a se- 
vere sign-problem, s = 0.051. We simulate the model us- 
ing the dynamical cluster approximation (DCA) with the 
Hirsch-Fye algorithm as a cluster solver. |12j The DCA is 
a coarse-graining approximation, in which the one par- 
ticle Green's function is coarse-grained in the first Bril- 
louin zone of the reciprocal space of the lattice. It is 
defined over cluster points K = (K x ,K y ) and imagi- 
nary time t, and accurately describes short-ranged cor- 
relations. We performed the simulations on the Cray 
X\ supercomputer at Oak Ridge National Laboratory to 
cope with the large amount of data needed in simula- 
tions with small average signs. We calculated 8000 data 
points (bins) (Gs| m ',s^), and for every data point we 
averaged 600 QMC measurements. 

In Fig. ^ we show histograms of the data distribution 
when the bin size is increased five times, which corre- 
sponds to an average of 3000 measurements per bin. Both 
Gs(/3/16) (Fig. [2(a)) and s (Fig. 12(b)) are normally dis- 
tributed to a good approximation, unlike G(/3/16) data 
points (Fig.n(c)) which are strongly pea ked, being char- 
acterized by a large positive kurtosis [Lj . Similar distri- 
butions of data are observed (not shown) for the other 
values of the imaginary time. In order to become Gaus- 
sianly distributed the G data require averaging over a 
much larger number of measurements than Gs and s 
data. In our case this number is about five times larger 
but this value is dependent on the specificity of the prob- 
lem considered, being determined by both the magnitude 
of the correlations and the value and the distribution of 
the sign [l5| . 

The way to achieve good data for MEM is i) rebinning 
(Gsi,s) until they become normal distributed, and ii) 
remove the correlations between data points Gsi and s by 
a rotations in the space (Gsi, s). However, the problem 
that arises is the calculation of \ 2 fEa.l5 HT3*|) in this basis 
which now includes the extra sign dimension. This issue 
will be discussed in the next section f Sec. II Vjl . 
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IV. LIKELIHOOD FUNCTION 
A. Formalism 

Denoting h = (Gs, s), the likelihood function is defined 
as P[/i|A], since the measured quantities in the QMC 
process are the h points (and not G). As we showed 
in the previous section, for acceptable values of the bin 
size, the data h are to a good approximation Gaussianly 
distributed. Therefore, the likelihood function will have 
the same form as Eq. with \ 2 



L+l 



hi(A))[C^ 1 ] l j(hj - hj{A)) 



(14) 



The covariance matrix has now the dimension (L + l) x 
(L + l), 
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FIG. 2: One-particle spectra at K = (n,n/2) calculated with 
different amounts of data using a) the new method and b) the 
old method. 
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B. Discussion 



The only problem which remains to be solved is finding 
an equation for h(A), since Eq.^only provides a relation 
for G(A). In order to achieve this we do the following: 
First we absorb the sign into the spectrum, i.e. we define 
A as 



A(u>) = sA(lo) 



(16) 



Instead of searching for a spectrum A which satisfies 
Eq. \T\we search for A which satisfies 



Gs(t) 



K(r,uj)A(uj)dLU 



(17) 



Second, we consider the spectrum normalization sum- 
rule 



B = / A(u)dw, 



which implies 



B 



A(uj)du> . 



(18) 



(19) 



Here B is a constant, equal to one for the the one-particle 
spectra and equal to x(T) f° r the two-particle case [l6j . 
Because the sign s was absorbed into the definition of A 
we relate the sign fluctuations to the norm of the new 
spectrum. Both Eq. 1171 and Eq. ^|can be written as 



We want to point that for the one-particle case, where 
B = 1, Eq. ljT§|) is equivalent to 



Gs(0) + Ga(P) = s 



(21) 



By using Eq. 119(1 in the calculation of the likelihood 
function we impose 



G(0) + G(/3) 



Gs(0) Ga{0) 



1- 



(22) 



at every measurement. Since Eq. [53 results solely from 
the anticommutation relation of the one-particle opera- 
tors it should be satisfied in every possible configuration 
and implicitly in every measurement. Therefore, this way 
of implementing the normalization sum-rule is more nat- 
ural than the usual way based on Lagrange multipliers 
where the constraint is globally imposed, i.e. not at ev- 
ery measurement but only for the final Green's function 
obtained at the end of the QMC process. 

For the two-particle case, where B = x(T), the sum- 
rule Eq. I(18|) is not an independent equation as in the one- 
particle case, but merely an integration over r of Eq. ^ 
Therefore it is essential to treat B as a constant (equal 
to the final, averaged over all QMC configurations, x(T)) 
and to disregard measurement dependent fluctuations in 
x(T). This way we relate the norm of A only to the 
fluctuation of the sign s. 



E 



i v >A v , K}i 



B 



i < L 
i = L 



(20) 



This is the basic equation which relates h to A and de- 
termines the likelihood function P[h\A] = P[h\A]. MEM 
will produce the most probable spectrum A normalized 
to s which minimizes the \ 2 function in Eq. ((14(1 subject 
to the entropy constraint. 



V. COMPARISON OF THE SPECTRA 
OBTAINED WITH THE TWO METHODS 

In this section we present a comparison between the 
spectra obtained with the old approach which does not 
consider the sign covariance, and the new one described 
in Sec. IIVI For calculating the one-particle spectrum at 
the highest temperatures, the model m(u) used in the 
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FIG. 3: Two-particle spin susceptibility spectra A(lu) — 
x"{ui)/u> at K = (0, 7r/2) calculated for different amounts 
of data with a) the new method and b) the old method. 
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FIG. 4: Imaginary part of the spin susceptibility x" [ u ) ~ 
A(u))lj at K — (0, 7r/2) calculated for different amount of 
data with a) the new method and b) the old method. 



entropy functional Eq. (JjJ, is chosen to be a Gaussian 
function. The model for lower temperatures is taken to 
be the spectrum obtained at a slightly higher tempera- 
ture, a procedure called annealing. 

In Fig. |21 (a) and (b) we show the one-particle spec- 
tra of the Hubbard model at K = (w,tt/2) calculated 
for different amounts of data with the new and respec- 
tively with the old method. In both cases, when a large 
amount of data is used (8000 data points) the spectrum 
(thick continuous line) is converged. Moreover the two 
methods produce the same spectrum. However, it can 
be noticed that with the new method a reasonably good 
spectrum, i.e. a spectrum close to the converged one, can 
be obtained with an amount of data as small as 100 data 
points (see the double-dotted dashed line in Fig. 121(a)). 
On the other hand, the old method requires at least 600 
data points for a spectrum of comparable quality (see 
the dotted line in Fig. 0(b)). Thus in our case we find 
that the new method reduces the computational cost of 
calculating the one-particle spectra about six times. 

In general the calculation of the two-particle spectra 
turns out to be more difficult, because the data are more 
correlated and because a good default model is missing. 
In our case the amount of data needed for calculating 
the two-particle spectra is about one order of magnitude 
larger. At high temperature we choose a default model 
of the form 

m(u) = exp[A + X%u) coth(/3w/2)] (23) 

where Ao and Ai are Lagrange multipliers chosen to sat- 
isfy certain moment constraints, as described in ref 0. 
Again, the annealing technique is used for lower temper- 
ature calculations. 

The spin susceptibility spectra at K = (0,7r/2) cal- 
culated with the two methods for different amounts of 
data is shown in Fig. [3J For the two-particle case the 
spectra A(K,u>) defined in Eq. ^is in fact \" 



where x"(JC,cj) is the imaginary part of the spin sus- 
ceptibility. When a large amount of data is used (8000 
data points) both methods produce the same spectrum 
(thick continuous line in both Fig. |3](a) and (b)). How- 
ever, for small amounts of data, the new method produces 
significantly superior results to the old method. For ex- 
ample the spectrum obtained with the new method for 
1000 data points (thin continuous line in Fig. |3| (a)) is 
closer to the converged spectrum than the one obtained 
with the old method for 4000 data points (dashed line in 
Fig. 13 (b)). The same conclusion can be drawn by com- 
paring the high energy features (ps 0.5 — 1.5 eV) visible in 
the plot of the imaginary part of the spin susceptibility 
x"(K,lo) in Fig.H(a) and (b). 



VI. CONCLUSIONS 

We showed that for QMC simulations with a severe 
sign problem, achieving a normal distribution of G is ex- 
tremely difficult. The problem results from the nonlinear 
operation which relates G to the measured quantities Gs 
and s, and from the correlation between the Gs points 
and s. 

By absorbing the sign into the definition of the spec- 
trum, the sign fluctuations will determine the norm of the 
spectrum. A connection is thus established between the 
measured quantities (Gs, s) and the renormalizcd spec- 
trum A — As. The likelihood function is calculated with 
regard to the directly measured (Gs, s) data, thus no 
nonlinear manipulation of the data is being applied. The 
correlations between Gs and s can be removed by a ro- 
tation in the space determined by these vectors. 

We illustrated the power of this approach by a com- 
parison of the spectra obtained with the old and the new 
method. When the sign is small and the correlation be- 
tween the sign s and the measured Gs points is signif- 
icant, the old method requires a very large amount of 
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measurements per bin in order to produce the normally 
distributed and uncorrelated data points necessary for 
obtaining good spectra. In contrast, the new method 
provides good spectra for a much smaller amount of data. 
In our case the old method needs about six times more 
data than the new one, but for other problems charac- 
terized by stronger correlations this amount can be much 
larger. 
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